Deletion/Substitution/Addition Algorithm for Partitioning the Covariate Space in Prediction

Author: Molinaro Annette
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 09/11/2004

We propose a new method for predicting censored (and non-censored) clinical outcomes from a highly complex covariate space. Previously we suggested a unified strategy for predictor construction, selection, and performance assessment. Here we introduce a new algorithm which generates a piecewise constant estimation sieve of candidate predictors based on an intensive and comprehensive search over the entire covariate space. This algorithm allows us to elucidate interactions and correlation patterns in addition to main effects.

Cross-Validating and Bagging Partitioning Algorithms with Variable Importance

Author: Molinaro Annette M.
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 05/08/2005

We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging schemes, we compare via simulations the predictive ability of a single Classification and Regression Tree (CART) with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated.

An independently validated nomogram for isocitrate dehydrogenase-wild-type glioblastoma patient survival.

Author: Barnholtz-Sloan Jill S
Berger Mitchel S
Chunduru Pranathi
Cioffi Gino
Gittleman Haley
Molinaro Annette M
Sloan Andrew E
Publication venue: eScholarship, University of California
Publication date: 01/05/2019

Background: In 2016, the World Health Organization reclassified the definition of glioblastoma (GBM), dividing these tumors into isocitrate dehydrogenase (IDH)-wild-type and IDH-mutant GBM, where the vast majority of GBMs are IDH-wild-type. Nomograms are useful tools for individualized estimation of survival. This study aimed to develop and independently validate a nomogram for IDH-wild-type patients with newly diagnosed GBM. Methods: Data were obtained from newly diagnosed GBM patients from the Ohio Brain Tumor Study (OBTS) and the University of California San Francisco (UCSF) for diagnosis years 2007-2017 with the following variables: age at diagnosis, sex, extent of resection, concurrent radiation/temozolomide (TMZ) status, Karnofsky Performance Status (KPS), O6-methylguanine-DNA methyltransferase (MGMT) methylation status, and IDH mutation status. Survival was assessed using Cox proportional hazards regression, random survival forests, and recursive partitioning analysis, with adjustment for known prognostic factors. The models were developed using the OBTS data and independently validated using the UCSF data. Models were internally validated using 10-fold cross-validation and externally validated by plotting calibration curves. Results: A final nomogram was validated for IDH-wild-type newly diagnosed GBM. Factors that increased the probability of survival included younger age at diagnosis, female sex, having gross total resection, having concurrent radiation/TMZ, having a high KPS, and having MGMT methylation. Conclusions: A nomogram that calculates individualized survival probabilities for IDH-wild-type patients with newly diagnosed GBM could be useful to physicians for counseling patients regarding treatment decisions and optimizing therapeutic approaches. Free software for implementing this nomogram is provided: https://gcioffi.shinyapps.io/Nomogram_For_IDH_Wildtype_GBM_H_Gittleman/

Comparative Genomic Hybridization Array Analysis

Author: Molinaro Annette M.
Moore Dan H.
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 01/04/2002

At the present time, there is increasing evidence that cancer may be regulated by the number of copies of genes in tumor cells. Through microarray technology it is now possible to measure the number of copies of thousands of genes and gene segments in samples of chromosomal DNA. Microarray comparative genomic hybridization (array CGH) provides the opportunity to both measure DNA sequence copy number gains and losses and map these aberrations to the genomic sequence. Gains can signify the over-expression of oncogenes, genes which stimulate cell growth and have become hyperactive, while losses can signify under-expression of tumor suppressor genes, genes whose activity stops the formation of tumors. In order to better understand the progression of cancer and the differences between cancer and non-cancer tissue it is of great importance to fully understand what is happening at the chromosomal level. In the hopes of finding a genetic signature for subtypes of cancer, it is our intention to explore statistical approaches to array CGH data. The Waldman Lab at UCSF-CCC graciously allowed us to access data from their renal cancer study. This project was designed to determine whether microarray information on copy number of genes could be used to discriminate among four subtypes of renal cancer

Factor analysis for survival time prediction with informative censoring and diverse covariates

Author: McCurdy Shannon
Molinaro Annette
Pachter Lior
Publication venue: 'Royal College of Obstetricians & Gynaecologists (RCOG)'
Publication date: 10/09/2019

Fulfilling the promise of precision medicine requires accurately and precisely classifying disease states. For cancer, this includes prediction of survival time from a surfeit of covariates. Such data presents an opportunity for improved prediction, but also a challenge due to high dimensionality. Furthermore, disease populations can be heterogeneous. Integrative modeling is sensible, as the underlying hypothesis is that joint analysis of multiple covariates provides greater explanatory power than separate analyses. We propose an integrative latent variable model that combines factor analysis for various data types and an exponential proportional hazards (EPH) model for continuous survival time with informative censoring. The factor and EPH models are connected through low‐dimensional latent variables that can be interpreted and visualized to identify subpopulations. We use this model to predict survival time. We demonstrate this model's utility in simulation and on four Cancer Genome Atlas datasets: diffuse lower‐grade glioma, glioblastoma multiforme, lung adenocarcinoma, and lung squamous cell carcinoma. These datasets have small sample sizes, high‐dimensional diverse covariates, and high censorship rates. We compare the predictions from our model to three alternative models. Our model outperforms in simulation and is competitive on real datasets. Furthermore, the low‐dimensional visualization for diffuse lower‐grade glioma displays known subpopulations

Survival Point Estimate Prediction in Matched and Non-Matched Case-Control Subsample Designed Studies

Author: Kerlikowske Karla
Molinaro Annette M.
Moore Dan H.
van der Laan Mark J.
Publication venue: Collection of Biostatistics Research Archive
Publication date: 19/08/2005

Providing information about the risk of disease and clinical factors that may increase or decrease a patient's risk of disease is standard medical practice. Although case-control studies can provide evidence of strong associations between diseases and risk factors, clinicians need to be able to communicate to patients the age-specific risks of disease over a defined time interval for a set of risk factors. An estimate of absolute risk cannot be determined from case-control studies because cases are generally chosen from a population whose size is not known (necessary for calculation of absolute risk) and where duration of follow-up is not known (necessary for calculation of incidence). This problem can sometimes be overcome by using a nested case-control design. We have collected data on a National Cancer Institute funded population-based cohort study. This study contains a matched set of cases and controls within the cohort. This design is more cost-efficient than a full cohort study since expensive predictor variables (genomic measures, sex hormone levels, mammographic breast density) are measured on all of the cases, but on only a sample of the cohort who did not develop the outcome of interest (the controls). In addition, this design avoids the potential biases of conventional case-control studies that draw cases and controls from different populations. Importantly, the presence or absence of the outcome of interest has been established for the entire cohort within the same time period. The specifics of the sampling in our study do not adhere to the assumptions for absolute risk estimation methods previously developed in the literature. Here we introduce a novel method which provides locally efficient estimators to predict the absolute risk of a cohort from measures only taken on the matched case-control participants.
The proposed method is evaluated using simulation studies and survival data from women with ductal carcinoma in situ, a non-invasive form of breast cancer. A generalization of the proposed method is related to other similar sampling designs such as nested case-control, case-cohort, and two-stage case-control.

Characterization of Metabolic, Diffusion, and Perfusion Properties in GBM: Contrast-Enhancing versus Non-Enhancing Tumor.

Author: Autry Adam
Cha Soonmee
Chang Susan M
Lupo Janine M
Maleschlijski Stojan
Molinaro Annette M
Nelson Sarah J
Phillips Joanna J
Roy Ritu
Publication venue: eScholarship, University of California
Publication date: 01/12/2017

Background: Although the contrast-enhancing (CE) lesion on T1-weighted MR images is widely used as a surrogate for glioblastoma (GBM), there are also non-enhancing regions of infiltrative tumor within the T2-weighted lesion, which elude radiologic detection. Because non-enhancing GBM (Enh-) challenges clinical patient management as latent disease, this study sought to characterize ex vivo metabolic profiles from Enh- and CE GBM (Enh+) samples, alongside histological and in vivo MR parameters, to assist in defining criteria for estimating total tumor burden. Methods: Fifty-six patients with newly diagnosed GBM received a multi-parametric pre-surgical MR examination. Targets for obtaining image-guided tissue samples were defined based on in vivo parameters that were suspicious for tumor. The actual location from where tissue samples were obtained was recorded, and half of each sample was analyzed for histopathology while the other half was scanned using HR-MAS spectroscopy. Results: The Enh+ and Enh- tumor samples demonstrated comparable mitotic activity, but also significant heterogeneity in microvascular morphology. Ex vivo spectroscopic parameters indicated similar levels of total choline and N-acetylaspartate between these contrast-based radiographic subtypes of GBM, and characteristic differences in the levels of myo-inositol, creatine/phosphocreatine, and phosphoethanolamine. Analysis of in vivo parameters at the sample locations was consistent with histological and ex vivo metabolic data. Conclusions: The similarity between ex vivo levels of choline and NAA, and between in vivo levels of choline, NAA and nADC in Enh+ and Enh- tumor, indicates that these parameters can be used in defining non-invasive metrics of total tumor burden for patients with GBM.

Directory of Open Access Journals
